Installing Dependencies

In this autonomous car driving scenario we use the Box2D environments of OpenAI Gym. Because the environment is a 2D physics simulation, we need to install Box2D along with SWIG, which is needed to build its Python bindings. The conda/pip commands below install them ↓

In [ ]:
# If you're working from the Anaconda Prompt, run these commands directly in your conda console.
# From a Jupyter notebook, prefix them with "!" instead: "!pip install swig" and "!pip install box2d".


#conda install swig

#pip install box2d

Import Dependencies

In [1]:
import gym 
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecFrameStack,DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy
import os

Test Environment

In [2]:
envname = "CarRacing-v0"
env = gym.make(envname)
C:\Users\meet\anaconda3\envs\tensorflow_ennv\lib\site-packages\gym\logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
In [4]:
env.action_space
Out[4]:
Box(-1.0, 1.0, (3,), float32)
In [5]:
env.observation_space
Out[5]:
Box(0, 255, (96, 96, 3), uint8)
In [3]:
episodes = 2
for episode in range(1,episodes+1):
    state = env.reset()
    done = False
    score = 0
    while not done:
        env.render()
        action = env.action_space.sample()
        n_state,reward,done,info = env.step(action)
        score+=reward
        
    print(f'Episode : {episode},score:{score}')
    
env.close()
Track generation: 1131..1424 -> 293-tiles track
C:\Users\meet\anaconda3\envs\tensorflow_ennv\lib\site-packages\pyglet\image\codecs\wic.py:289: UserWarning: [WinError -2147417850] Cannot change thread mode after it is set
  warnings.warn(str(err))
Episode : 1,score:-31.506849315068994
Track generation: 1063..1346 -> 283-tiles track
Episode : 2,score:-29.078014184397592

vid4.gif

The action space is a continuous Box with three components, and the observation space is a 96x96 RGB image (a Box with values from 0 to 255 and shape (96, 96, 3)). Because the observations are RGB images, we again use a CNN-based policy, 'CnnPolicy'. Following the algorithm chart, we pick PPO, which supports Box action spaces.
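The reasoning above can be sketched as a small helper. This is only an illustration: `choose_policy` is a hypothetical function, not part of Stable-Baselines3, and its image test (a 3-D shape with uint8 values) is an assumed rule of thumb. Only the policy names 'CnnPolicy' and 'MlpPolicy' come from the library.

```python
def choose_policy(obs_shape, obs_dtype):
    """Hypothetical helper: pick an SB3 policy name from the observation space."""
    # Assumed rule of thumb: a 3-D uint8 array is treated as an image
    looks_like_image = len(obs_shape) == 3 and obs_dtype == "uint8"
    return "CnnPolicy" if looks_like_image else "MlpPolicy"


# CarRacing-v0 reports Box(0, 255, (96, 96, 3), uint8)
policy_name = choose_policy((96, 96, 3), "uint8")
print(policy_name)
```

A low-dimensional vector observation, such as shape (4,) with float32 values, would fall through to 'MlpPolicy' under the same rule.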

Train Model

In [10]:
log_path = os.path.join("Training",'Logs')
model = PPO('CnnPolicy',env,verbose = 1,tensorboard_log=log_path)
model.learn(total_timesteps = 40000)
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Track generation: 967..1212 -> 245-tiles track
retry to generate track (normal if there are not manyinstances of this message)
Track generation: 1345..1693 -> 348-tiles track
Logging to Training\Logs\PPO_8
Track generation: 1110..1391 -> 281-tiles track
Track generation: 1187..1488 -> 301-tiles track
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -54.5    |
| time/              |          |
|    fps             | 111      |
|    iterations      | 1        |
|    time_elapsed    | 18       |
|    total_timesteps | 2048     |
---------------------------------
Track generation: 1139..1428 -> 289-tiles track
Track generation: 1140..1429 -> 289-tiles track
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1e+03        |
|    ep_rew_mean          | -51.7        |
| time/                   |              |
|    fps                  | 55           |
|    iterations           | 2            |
|    time_elapsed         | 74           |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 0.0019682772 |
|    clip_fraction        | 0.0731       |
|    clip_range           | 0.2          |
|    entropy_loss         | -4.25        |
|    explained_variance   | 0.00876      |
|    learning_rate        | 0.0003       |
|    loss                 | 0.23         |
|    n_updates            | 10           |
|    policy_gradient_loss | -0.00679     |
|    std                  | 0.994        |
|    value_loss           | 0.621        |
------------------------------------------
Track generation: 1062..1331 -> 269-tiles track
Track generation: 1093..1368 -> 275-tiles track
retry to generate track (normal if there are not manyinstances of this message)
Track generation: 1136..1424 -> 288-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -48.1       |
| time/                   |             |
|    fps                  | 36          |
|    iterations           | 3           |
|    time_elapsed         | 166         |
|    total_timesteps      | 6144        |
| train/                  |             |
|    approx_kl            | 0.014563009 |
|    clip_fraction        | 0.0756      |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.23       |
|    explained_variance   | 0.0113      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.209       |
|    n_updates            | 20          |
|    policy_gradient_loss | -0.00525    |
|    std                  | 0.989       |
|    value_loss           | 0.603       |
-----------------------------------------
Track generation: 1056..1324 -> 268-tiles track
Track generation: 1182..1492 -> 310-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -47.1       |
| time/                   |             |
|    fps                  | 29          |
|    iterations           | 4           |
|    time_elapsed         | 278         |
|    total_timesteps      | 8192        |
| train/                  |             |
|    approx_kl            | 0.010417948 |
|    clip_fraction        | 0.142       |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.22       |
|    explained_variance   | 0.13        |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0618      |
|    n_updates            | 30          |
|    policy_gradient_loss | -0.0157     |
|    std                  | 0.986       |
|    value_loss           | 0.525       |
-----------------------------------------
Track generation: 1096..1374 -> 278-tiles track
Track generation: 1210..1519 -> 309-tiles track
retry to generate track (normal if there are not manyinstances of this message)
Track generation: 1120..1404 -> 284-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -45         |
| time/                   |             |
|    fps                  | 26          |
|    iterations           | 5           |
|    time_elapsed         | 379         |
|    total_timesteps      | 10240       |
| train/                  |             |
|    approx_kl            | 0.020687403 |
|    clip_fraction        | 0.158       |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.2        |
|    explained_variance   | 0.0995      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0642      |
|    n_updates            | 40          |
|    policy_gradient_loss | -0.0213     |
|    std                  | 0.976       |
|    value_loss           | 0.494       |
-----------------------------------------
Track generation: 1123..1408 -> 285-tiles track
Track generation: 1170..1470 -> 300-tiles track
retry to generate track (normal if there are not manyinstances of this message)
Track generation: 1171..1468 -> 297-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -44.1       |
| time/                   |             |
|    fps                  | 25          |
|    iterations           | 6           |
|    time_elapsed         | 489         |
|    total_timesteps      | 12288       |
| train/                  |             |
|    approx_kl            | 0.018048383 |
|    clip_fraction        | 0.147       |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.17       |
|    explained_variance   | 0.261       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.204       |
|    n_updates            | 50          |
|    policy_gradient_loss | -0.0198     |
|    std                  | 0.968       |
|    value_loss           | 0.567       |
-----------------------------------------
Track generation: 1199..1511 -> 312-tiles track
Track generation: 1074..1347 -> 273-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -43.4       |
| time/                   |             |
|    fps                  | 24          |
|    iterations           | 7           |
|    time_elapsed         | 584         |
|    total_timesteps      | 14336       |
| train/                  |             |
|    approx_kl            | 0.012238113 |
|    clip_fraction        | 0.162       |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.14       |
|    explained_variance   | 0.234       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0534      |
|    n_updates            | 60          |
|    policy_gradient_loss | -0.0233     |
|    std                  | 0.96        |
|    value_loss           | 0.425       |
-----------------------------------------
Track generation: 1290..1617 -> 327-tiles track
Track generation: 1067..1338 -> 271-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -46         |
| time/                   |             |
|    fps                  | 24          |
|    iterations           | 8           |
|    time_elapsed         | 674         |
|    total_timesteps      | 16384       |
| train/                  |             |
|    approx_kl            | 0.031078117 |
|    clip_fraction        | 0.202       |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.12       |
|    explained_variance   | 0.215       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0961      |
|    n_updates            | 70          |
|    policy_gradient_loss | -0.0324     |
|    std                  | 0.952       |
|    value_loss           | 0.394       |
-----------------------------------------
Track generation: 1163..1458 -> 295-tiles track
Track generation: 1030..1292 -> 262-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -45         |
| time/                   |             |
|    fps                  | 24          |
|    iterations           | 9           |
|    time_elapsed         | 760         |
|    total_timesteps      | 18432       |
| train/                  |             |
|    approx_kl            | 0.013630267 |
|    clip_fraction        | 0.196       |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.09       |
|    explained_variance   | -0.482      |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0202     |
|    n_updates            | 80          |
|    policy_gradient_loss | -0.0303     |
|    std                  | 0.937       |
|    value_loss           | 0.264       |
-----------------------------------------
Track generation: 1175..1473 -> 298-tiles track
Track generation: 1224..1534 -> 310-tiles track
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -47        |
| time/                   |            |
|    fps                  | 24         |
|    iterations           | 10         |
|    time_elapsed         | 850        |
|    total_timesteps      | 20480      |
| train/                  |            |
|    approx_kl            | 0.05662749 |
|    clip_fraction        | 0.267      |
|    clip_range           | 0.2        |
|    entropy_loss         | -4.05      |
|    explained_variance   | 0.146      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0313    |
|    n_updates            | 90         |
|    policy_gradient_loss | -0.0366    |
|    std                  | 0.929      |
|    value_loss           | 0.282      |
----------------------------------------
Track generation: 1220..1529 -> 309-tiles track
Track generation: 1323..1658 -> 335-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -45.7       |
| time/                   |             |
|    fps                  | 23          |
|    iterations           | 11          |
|    time_elapsed         | 939         |
|    total_timesteps      | 22528       |
| train/                  |             |
|    approx_kl            | 0.030610416 |
|    clip_fraction        | 0.208       |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.01       |
|    explained_variance   | 0.303       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.00841    |
|    n_updates            | 100         |
|    policy_gradient_loss | -0.0282     |
|    std                  | 0.916       |
|    value_loss           | 0.264       |
-----------------------------------------
Track generation: 1216..1531 -> 315-tiles track
Track generation: 1079..1360 -> 281-tiles track
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -48.1      |
| time/                   |            |
|    fps                  | 23         |
|    iterations           | 12         |
|    time_elapsed         | 1034       |
|    total_timesteps      | 24576      |
| train/                  |            |
|    approx_kl            | 0.04158255 |
|    clip_fraction        | 0.242      |
|    clip_range           | 0.2        |
|    entropy_loss         | -3.98      |
|    explained_variance   | 0.706      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0299    |
|    n_updates            | 110        |
|    policy_gradient_loss | -0.0345    |
|    std                  | 0.907      |
|    value_loss           | 0.271      |
----------------------------------------
Track generation: 1183..1488 -> 305-tiles track
Track generation: 1101..1381 -> 280-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -48.6       |
| time/                   |             |
|    fps                  | 23          |
|    iterations           | 13          |
|    time_elapsed         | 1119        |
|    total_timesteps      | 26624       |
| train/                  |             |
|    approx_kl            | 0.025478438 |
|    clip_fraction        | 0.242       |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.95       |
|    explained_variance   | 0.738       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.000596    |
|    n_updates            | 120         |
|    policy_gradient_loss | -0.0249     |
|    std                  | 0.898       |
|    value_loss           | 0.118       |
-----------------------------------------
Track generation: 1160..1454 -> 294-tiles track
Track generation: 1142..1432 -> 290-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -48.6       |
| time/                   |             |
|    fps                  | 23          |
|    iterations           | 14          |
|    time_elapsed         | 1211        |
|    total_timesteps      | 28672       |
| train/                  |             |
|    approx_kl            | 0.044771098 |
|    clip_fraction        | 0.253       |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.91       |
|    explained_variance   | 0.718       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0302      |
|    n_updates            | 130         |
|    policy_gradient_loss | -0.0298     |
|    std                  | 0.885       |
|    value_loss           | 0.325       |
-----------------------------------------
Track generation: 1083..1364 -> 281-tiles track
Track generation: 1099..1377 -> 278-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -46.4       |
| time/                   |             |
|    fps                  | 23          |
|    iterations           | 15          |
|    time_elapsed         | 1304        |
|    total_timesteps      | 30720       |
| train/                  |             |
|    approx_kl            | 0.045483083 |
|    clip_fraction        | 0.28        |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.89       |
|    explained_variance   | 0.733       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.00615    |
|    n_updates            | 140         |
|    policy_gradient_loss | -0.037      |
|    std                  | 0.882       |
|    value_loss           | 0.246       |
-----------------------------------------
Track generation: 1060..1329 -> 269-tiles track
Track generation: 1115..1406 -> 291-tiles track
---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1e+03     |
|    ep_rew_mean          | -44.6     |
| time/                   |           |
|    fps                  | 23        |
|    iterations           | 16        |
|    time_elapsed         | 1398      |
|    total_timesteps      | 32768     |
| train/                  |           |
|    approx_kl            | 0.0336262 |
|    clip_fraction        | 0.254     |
|    clip_range           | 0.2       |
|    entropy_loss         | -3.86     |
|    explained_variance   | 0.707     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.0507    |
|    n_updates            | 150       |
|    policy_gradient_loss | -0.0364   |
|    std                  | 0.872     |
|    value_loss           | 0.313     |
---------------------------------------
Track generation: 1096..1374 -> 278-tiles track
Track generation: 1220..1529 -> 309-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -44.4       |
| time/                   |             |
|    fps                  | 23          |
|    iterations           | 17          |
|    time_elapsed         | 1487        |
|    total_timesteps      | 34816       |
| train/                  |             |
|    approx_kl            | 0.026795227 |
|    clip_fraction        | 0.272       |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.81       |
|    explained_variance   | 0.813       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.0946      |
|    n_updates            | 160         |
|    policy_gradient_loss | -0.0366     |
|    std                  | 0.852       |
|    value_loss           | 0.443       |
-----------------------------------------
Track generation: 1013..1276 -> 263-tiles track
Track generation: 1104..1392 -> 288-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -41.1       |
| time/                   |             |
|    fps                  | 23          |
|    iterations           | 18          |
|    time_elapsed         | 1583        |
|    total_timesteps      | 36864       |
| train/                  |             |
|    approx_kl            | 0.050956115 |
|    clip_fraction        | 0.34        |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.76       |
|    explained_variance   | 0.664       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.00293     |
|    n_updates            | 170         |
|    policy_gradient_loss | -0.0419     |
|    std                  | 0.844       |
|    value_loss           | 0.307       |
-----------------------------------------
Track generation: 1021..1280 -> 259-tiles track
Track generation: 989..1240 -> 251-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -39.2       |
| time/                   |             |
|    fps                  | 23          |
|    iterations           | 19          |
|    time_elapsed         | 1674        |
|    total_timesteps      | 38912       |
| train/                  |             |
|    approx_kl            | 0.063627414 |
|    clip_fraction        | 0.305       |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.72       |
|    explained_variance   | 0.684       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.00181     |
|    n_updates            | 180         |
|    policy_gradient_loss | -0.0382     |
|    std                  | 0.833       |
|    value_loss           | 0.576       |
-----------------------------------------
Track generation: 1148..1439 -> 291-tiles track
Track generation: 1185..1494 -> 309-tiles track
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -40.3       |
| time/                   |             |
|    fps                  | 23          |
|    iterations           | 20          |
|    time_elapsed         | 1773        |
|    total_timesteps      | 40960       |
| train/                  |             |
|    approx_kl            | 0.043249875 |
|    clip_fraction        | 0.34        |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.69       |
|    explained_variance   | 0.857       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0154     |
|    n_updates            | 190         |
|    policy_gradient_loss | -0.0434     |
|    std                  | 0.824       |
|    value_loss           | 0.217       |
-----------------------------------------
Out[10]:
<stable_baselines3.ppo.ppo.PPO at 0x1dbe721f348>

As with the previous environments, we save the training logs so that we can monitor the training metrics. To view them, run tensorboard --logdir=<log_directory>, where log_directory is the folder the logs were saved to. This serves a dashboard at localhost:6006 by default, which you can open in a browser. The cell above trains the model for 40,000 timesteps; for a more robust model, increase the training budget to 1M timesteps or more.
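The log above ends at 40,960 total timesteps even though 40,000 were passed to model.learn. A rough sketch of the arithmetic follows, assuming a rollout length of 2,048 steps and 10 optimization epochs per rollout. Both values are inferred from the log (the total_timesteps step between tables and the n_updates increments) and are treated here as assumptions rather than read from the model.

```python
import math

requested_timesteps = 40_000
n_steps = 2048  # rollout length per iteration (assumed, matches the log)
n_epochs = 10   # optimization epochs per rollout (assumed, matches n_updates)

# PPO collects whole rollouts, so the requested budget is rounded up
iterations = math.ceil(requested_timesteps / n_steps)
actual_timesteps = iterations * n_steps

# Each table appears to be logged before that iteration's update runs,
# so the last table reports one update round fewer than the iteration count
logged_n_updates = (iterations - 1) * n_epochs

print(iterations, actual_timesteps, logged_n_updates)
```

These values line up with the final table in the log: iteration 20, 40,960 total timesteps, and n_updates of 190.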

Save Model

In [11]:
ppo_path = os.path.join('Training','Saved Models','PPO_Driving_model')
model.save(ppo_path)

Evaluate and Test Model

In [12]:
evaluate_policy(model,env,n_eval_episodes = 10,render = True)
env.close()
C:\Users\meet\anaconda3\envs\tensorflow_ennv\lib\site-packages\stable_baselines3\common\evaluation.py:69: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.
  UserWarning,
Track generation: 1152..1444 -> 292-tiles track
Track generation: 1015..1278 -> 263-tiles track
Track generation: 976..1224 -> 248-tiles track
Track generation: 1144..1434 -> 290-tiles track
Track generation: 1255..1573 -> 318-tiles track
Track generation: 1005..1268 -> 263-tiles track
Track generation: 1025..1288 -> 263-tiles track
retry to generate track (normal if there are not manyinstances of this message)
Track generation: 1142..1432 -> 290-tiles track
Track generation: 1117..1406 -> 289-tiles track
Track generation: 1171..1468 -> 297-tiles track
Track generation: 1061..1335 -> 274-tiles track
In [9]:
## Test the model: at each step it predicts its own action from the current observation

episodes = 3
for episode in range(1,episodes+1):
    obs = env.reset()
    done = False
    score = 0
    while not done:
        env.render()
        actions,_ = model.predict(obs)
        obs,reward,done,info = env.step(actions)
        score+=reward
        
    print(f' Episode : {episode},Score : {score}')
    
env.close()
Track generation: 1165..1460 -> 295-tiles track
 Episode : 1,Score : -38.77551020408213
Track generation: 1284..1609 -> 325-tiles track
 Episode : 2,Score : -47.5308641975313
Track generation: 1236..1549 -> 313-tiles track
 Episode : 3,Score : -64.74358974359028
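The evaluate_policy call earlier returns a (mean_reward, std_reward) pair, but its result is not displayed because env.close() is the last statement in that cell. As a minimal sketch, the same kind of summary can be computed by hand from the three test episodes printed above. Using the population standard deviation here is an assumption made for illustration.

```python
from statistics import mean, pstdev

# Scores printed by the three test episodes above
scores = [-38.77551020408213, -47.5308641975313, -64.74358974359028]

mean_reward = mean(scores)
std_reward = pstdev(scores)  # population standard deviation (assumed convention)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

A mean of roughly -50 is still far from a well-trained agent, so this summary supports the same conclusion as the animations: the agent needs more training.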

Model Result when Steps Trained = 10,000

After training for only 10,000 timesteps, the agent has learned almost nothing. The animation below makes it clear that the agent needs more training time to learn the environment. Another option is to try a different policy, which may also work. This was the result when the agent was trained for 10,000 timesteps:

car2.gif

Model Result when Steps Trained = 40,000

vid5.gif
